Ola-7B is a multimodal large language model jointly developed by Tencent, Tsinghua University, and Nanyang Technological University. Built on the Qwen2.5 architecture, it accepts text, image, video, and audio inputs and generates text outputs.
Key features include multimodal fusion across these input types and support for multiple languages.
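
Below is a minimal sketch of loading the model and running a text-only generation with Hugging Face Transformers. The repository id (`THUdyh/Ola-7b`) and the generic `AutoModelForCausalLM` loading path are assumptions for illustration; the official Ola codebase may ship its own loaders and multimodal preprocessing utilities, which are not shown here.

```python
# Minimal sketch: load Ola-7B and generate text.
# Assumptions: the Hugging Face repo id and that the checkpoint loads via the
# generic AutoModelForCausalLM path with trust_remote_code enabled.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "THUdyh/Ola-7b"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision so a 7B model fits on one GPU
    device_map="auto",
    trust_remote_code=True,      # the checkpoint ships custom multimodal code
)

# Text-only prompt; image, video, and audio inputs would go through the
# model's own preprocessing utilities.
inputs = tokenizer(
    "Describe what a multimodal model can do.", return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```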